Results 1 - 20 of 493
1.
Nat Genet ; 56(4): 721-731, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38622339

ABSTRACT

Coffea arabica, an allotetraploid hybrid of Coffea eugenioides and Coffea canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000-610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ~30.5 thousand years ago, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribute to pathogen resistance, laying the groundwork for future genomics-based breeding of C. arabica.


Subjects
Coffea, Coffea/genetics, Coffee, Plant Genome/genetics, Metagenomics, Plant Breeding
2.
Phys Rev E ; 109(3-1): 034303, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38632720

ABSTRACT

Graphs have become widely used to represent and study social, biological, and technological systems. Statistical methods to analyze empirical graphs were proposed based on the graph's spectral density. However, their running time is cubic in the number of vertices, precluding direct application to large instances. Thus, efficient algorithms to calculate the spectral density become necessary. For sparse graphs, the cavity method can efficiently approximate the spectral density of locally treelike undirected and directed graphs. However, it does not apply to most empirical graphs because they have heterogeneous structures. Thus, we propose methods for undirected and directed graphs with heterogeneous structures using a new vertex's neighborhood definition and the cavity approach. Our methods' time and space complexities are O(|E|h_{max}^{3}t) and O(|E|h_{max}^{2}t), respectively, where |E| is the number of edges, h_{max} is the size of the largest local neighborhood of a vertex, and t is the number of iterations required for convergence. We demonstrate the practical efficacy by estimating the spectral density of simulated and real-world undirected and directed graphs.

3.
RNA Biol ; 21(1): 1-12, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38528797

ABSTRACT

The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework integrating convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. This framework employs a combination of k-mer one-hot, k-mer dictionary, and feature extraction techniques for input representation. Extracted features, when embedded into the deep network, enable optimal utilization of spatial and sequential nuances of ncRNA sequences. Using benchmark datasets and real-world RNA samples from bacterial organisms, we evaluated the performance of BioDeepFuse. Results exhibited high accuracy in ncRNA classification, underscoring the robustness of our tool in addressing complex ncRNA sequence data challenges. The effective melding of CNN or BiLSTM with external features heralds promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into ncRNAs in cellular processes and disease manifestations. In addition to its original application in the context of bacterial organisms, the methodologies and techniques integrated into our framework can potentially render BioDeepFuse effective in various and broader domains.


Subjects
Deep Learning, Untranslated RNA/genetics, Algorithms, RNA, Computer Neural Networks
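The two k-mer input representations named in the BioDeepFuse abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `ACGU` alphabet, and the zero-based lexicographic indexing are assumptions made for illustration only, and the real framework may reserve indices for padding or handle ambiguous bases differently.

```python
from itertools import product

# Assumed RNA alphabet; ambiguous bases such as N are not handled here.
ALPHABET = "ACGU"


def kmers(seq, k):
    """Return all overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_dictionary(k):
    """Map every possible k-mer to an integer index (lexicographic order)."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def encode_dictionary(seq, k):
    """k-mer dictionary encoding: one integer per k-mer in the sequence."""
    index = kmer_dictionary(k)
    return [index[km] for km in kmers(seq, k)]


def encode_one_hot(seq, k):
    """k-mer one-hot encoding: one 4**k-long indicator row per k-mer."""
    size = len(ALPHABET) ** k
    rows = []
    for idx in encode_dictionary(seq, k):
        row = [0] * size
        row[idx] = 1
        rows.append(row)
    return rows
```

The dictionary form is compact and suits an embedding layer, while the one-hot form grows as 4^k per position, which is one reason small k values are typical for such inputs.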
5.
bioRxiv ; 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38260273

ABSTRACT

Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
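As a rough illustration of how called IBD segments translate into a relatedness estimate, the sketch below uses the standard relation r = k1/2 + k2, where k1 and k2 are the genome fractions shared IBD on one or both haplotypes. The function name and the segment representation (non-overlapping `(start, end)` pairs in the same units as the genome length) are assumptions for this example, not the method used in the study.

```python
def relatedness(ibd1_segments, ibd2_segments, genome_length):
    """Estimate the coefficient of relatedness from IBD segments.

    ibd1_segments / ibd2_segments: lists of (start, end) pairs, assumed
    non-overlapping, covering regions shared IBD on one or on both
    haplotypes. genome_length is the total callable length, same units.
    """
    ibd1 = sum(end - start for start, end in ibd1_segments)
    ibd2 = sum(end - start for start, end in ibd2_segments)
    k1 = ibd1 / genome_length
    k2 = ibd2 / genome_length
    return 0.5 * k1 + k2


# Parent-offspring: the whole genome is shared on exactly one haplotype.
parent_offspring = relatedness([(0, 100)], [], 100)

# Idealized full siblings: half the genome IBD1, a quarter IBD2.
full_siblings = relatedness([(0, 50)], [(50, 75)], 100)
```

Both idealized cases give the same expected value of 0.5, but real sibling pairs scatter around it because recombination is random, which is the within-class variation the abstract describes.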

6.
Mol Ecol Resour ; 24(2): e13904, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37994269

ABSTRACT

Several computational frameworks and workflows exist that recover genomes of prokaryotes, eukaryotes and viruses from metagenomes. Yet, it is difficult for scientists with little bioinformatics experience to evaluate quality, annotate genes, dereplicate, assign taxonomy and calculate relative abundance and coverage of genomes belonging to different domains. MuDoGeR is a user-friendly tool tailored for those familiar with the Unix command-line environment that makes it easy to recover genomes of prokaryotes, eukaryotes and viruses from metagenomes, either alone or in combination. We tested MuDoGeR using 24 individual-isolated genomes and 574 metagenomes, demonstrating its applicability both to a few samples and to high-throughput settings. While MuDoGeR can recover eukaryotic viral sequences, its characterization is predominantly skewed towards bacterial and archaeal viruses, reflecting the field's current state. However, acting as a dynamic wrapper, MuDoGeR is designed to constantly incorporate updates and integrate new tools, ensuring its ongoing relevance in the rapidly evolving field. MuDoGeR is open-source software available at https://github.com/mdsufz/MuDoGeR. MuDoGeR is also available as a Singularity container.


Subjects
Metagenome, Viruses, Metagenomics, Software, Bacteria/genetics, Phylogeny, Viruses/genetics
7.
Front Bioinform ; 3: 1322477, 2023.
Article in English | MEDLINE | ID: mdl-38152702

ABSTRACT

Proteinortho is a widely used tool to predict (co-)orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing. In this contribution, we evaluate and implement major algorithmic improvements that significantly enhance the speed of the analysis without reducing precision. Graph-based detection of (co-)orthologs is typically based on a reciprocal best alignment heuristic that requires an all vs. all comparison of proteins from all species under study. The initial identification of similar proteins is accelerated by introducing an alternative search tool along with a revised search strategy, the pseudo-reciprocal best alignment heuristic, that reduces the number of required sequence comparisons by one-half. The clustering algorithm was reworked to efficiently decompose very large clusters and accelerate processing. Proteinortho6 reduces the overall processing time by an order of magnitude compared to its predecessor while maintaining its small memory footprint and good predictive quality.
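The reciprocal best alignment idea mentioned in the abstract can be sketched for two species as plain reciprocal best hits. This is a simplified illustration under stated assumptions: the score dictionaries, protein names, and function names are hypothetical, and the sketch implements neither Proteinortho's actual heuristic nor the pseudo-reciprocal variant introduced in Proteinortho6.

```python
def best_hits(scores):
    """Return the top-scoring target for each query.

    scores maps (query, target) pairs to an alignment score
    (higher is better); ties keep the first pair encountered.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical alignment scores between proteins of species A and B.
a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 50, ("a2", "b2"): 80, ("a3", "b1"): 70}
b_vs_a = {("b1", "a1"): 88, ("b2", "a2"): 79, ("b2", "a1"): 40}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here `a3` is dropped because its best hit `b1` prefers `a1`. The sketch needs both search directions, which is the cost that a strategy halving the number of comparisons is meant to reduce.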

8.
Algorithms Mol Biol ; 18(1): 16, 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37940998

ABSTRACT

BACKGROUND: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestors coincide with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. RESULTS: Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial-time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial-time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.

9.
ACS Chem Biol ; 18(12): 2441-2449, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-37962075

ABSTRACT

The chemical biology of native nucleic acid modifications has seen an intense upswing, first concerning DNA modifications in the field of epigenetics and then concerning RNA modifications in a field that was correspondingly rebaptized epitranscriptomics by analogy. The German Research Foundation (DFG) has funded several consortia with a scientific focus in these fields, strengthening the traditionally well-developed nucleic acid chemistry community and inciting it to team up with colleagues from the life sciences and data science to tackle interdisciplinary challenges. This Perspective focuses on the genesis, scientific outcome, and downstream impact of the DFG priority program SPP1784 and offers insight into how it fecundated further consortia in the field. Pertinent research was funded from mid-2015 to 2022, including an extension related to the coronavirus pandemic. Despite being a detriment to research activity in general, the pandemic has resulted in tremendously boosted interest in the field of RNA and RNA modifications as a consequence of their widespread and successful use in vaccination campaigns against SARS-CoV-2. Funded principal investigators published over 250 pertinent papers with a very substantial impact on the field. The program also helped to redirect numerous laboratories toward this dynamic field. Finally, SPP1784 spawned initiatives for several funded consortia that continue to drive the fields of nucleic acid modification.


Subjects
Nucleic Acids, RNA, Genetic Epigenesis, Biology
10.
Bioinformatics ; 39(11), 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37944046

ABSTRACT

SUMMARY: RNA molecules play crucial roles in various biological processes. They mediate their function mainly by interacting with other RNAs or proteins. At present, information about these interactions is distributed over different resources, often providing the data in simple tab-delimited formats that differ between the databases. There is no standardized data format that can capture the nature of all these different interactions in detail. AVAILABILITY AND IMPLEMENTATION: Here, we propose the RNA interaction format (RIF) for the detailed representation of RNA-RNA and RNA-Protein interactions and provide reference implementations in C/C++, Python, and JavaScript. RIF is released under the GNU General Public License version 3 (GNU GPLv3) and is available at https://github.com/RNABioInfo/rna-interaction-format.


Subjects
RNA, Software, Factual Databases, Proteins
11.
Anim Microbiome ; 5(1): 48, 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37798675

ABSTRACT

BACKGROUND: Metagenomic data can shed light on animal-microbiome relationships and the functional potential of these communities. Over the past years, the generation of metagenomics data has increased exponentially, and so has the availability and reusability of data present in public repositories. However, identifying which datasets and associated metadata are available is not straightforward. We created the Animal-Associated Metagenome Metadata Database (AnimalAssociatedMetagenomeDB - AAMDB) to facilitate the identification and reuse of publicly available non-human, animal-associated metagenomic data, and metadata. Further, we used the AAMDB to (i) annotate common and scientific names of the species; (ii) determine the fraction of vertebrates and invertebrates; (iii) study their biogeography; and (iv) specify whether the animals were wild, pets, livestock or used for medical research. RESULTS: We manually selected metagenomes associated with non-human animals from SRA and MG-RAST. Next, we standardized and curated 51 metadata attributes (e.g., host, compartment, geographic coordinates, and country). The AAMDB version 1.0 contains 10,885 metagenomes associated with 165 different species from 65 different countries. From the collected metagenomes, 51.1% were recovered from animals associated with medical research or grown for human consumption (i.e., mice, rats, cattle, pigs, and poultry). Further, we observed an over-representation of animals collected in temperate regions (89.2%) and a lower representation of samples from the polar zones, with only 11 samples in total. The most common genus among invertebrate animals was Trichocerca (rotifers). CONCLUSION: Our work may guide host species selection in novel animal-associated metagenome research, especially in biodiversity and conservation studies. The data available in our database will allow scientists to perform meta-analyses and test new hypotheses (e.g., host-specificity, strain heterogeneity, and biogeography of animal-associated metagenomes), leveraging existing data. The AAMDB WebApp is a user-friendly interface that is publicly available at https://webapp.ufz.de/aamdb/ .

12.
Evolution ; 77(11): 2378-2391, 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37724883

ABSTRACT

Some selection-based theories propose that genome streamlining, favoring smaller genome sizes, is advantageous in nutritionally limited environments, particularly under P-limitation. To test this prediction, we conducted several experimental evolution trials on clonal populations of a facultatively asexual rotifer that exhibits intraspecific variation in genome size. Most trials showed a rapid decline in clonal diversity, which was accelerated in populations that were initially nonadapted. Populations consisting of three rotifer clones often became monoclonal within a few weeks, while populations starting with 120 clones eroded to 10 multilocus genotypes, of which only five were abundant in higher numbers. While P-limitation affected population growth during the experiments, it did not affect the outcome of clonal competition or the speed at which clonal diversity was lost. Common garden transplant experiments revealed that the evolved populations were better adapted to the experimental conditions than the ancestral controls. However, contrary to expectations, the evolved populations did not show an overrepresentation of small genomes. Intermediate genomes were also frequently abundant, although very large genomes were rare. Our findings suggest that fitness is more influenced by genotypic differences among clones than by differences in genome size, and indicate that such differences might hinder genome streamlining during early adaptation to a new environment.


Subjects
Genetic Variation, Genome Size, Genotype
13.
Theory Biosci ; 142(4): 301-358, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37573261

ABSTRACT

Rooted acyclic graphs appear naturally when the phylogenetic relationship of a set X of taxa involves not only speciations but also recombination, horizontal transfer, or hybridization that cannot be captured by trees. A variety of classes of such networks have been discussed in the literature, including phylogenetic, level-1, tree-child, tree-based, galled tree, regular, or normal networks as models of different types of evolutionary processes. Clusters arise in models of phylogeny as the sets [Formula: see text] of descendant taxa of a vertex v. The clustering system [Formula: see text] comprising the clusters of a network N conveys key information on N itself. In the special case of rooted phylogenetic trees, T is uniquely determined by its clustering system [Formula: see text]. Although this is no longer true for networks in general, it is of interest to relate properties of N and [Formula: see text]. Here, we systematically investigate the relationships of several well-studied classes of networks and their clustering systems. The main results are correspondences of classes of networks and clustering systems of the following form: If N is a network of type [Formula: see text], then [Formula: see text] satisfies [Formula: see text], and conversely if [Formula: see text] is a clustering system satisfying [Formula: see text] then there is a network N of type [Formula: see text] such that [Formula: see text]. This, in turn, allows us to investigate the mutual dependencies between the distinct types of networks in much detail.
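The notion of a cluster as the set of descendant taxa of a vertex can be made concrete with a short sketch. The adjacency-dictionary representation, the function name, the example network, and the choice to identify leaves with taxa are all assumptions made for this illustration and are not taken from the paper.

```python
def clustering_system(children, root):
    """Collect the clusters of a rooted acyclic graph.

    children maps each vertex to a list of its child vertices; vertices
    without children are treated as leaves (taxa). The cluster of a
    vertex is the set of leaves reachable from it.
    """
    memo = {}

    def cluster(v):
        if v in memo:
            return memo[v]
        kids = children.get(v, [])
        if not kids:
            result = frozenset([v])
        else:
            result = frozenset().union(*(cluster(c) for c in kids))
        memo[v] = result
        return result

    cluster(root)
    return set(memo.values())


# Hypothetical network: h is a hybrid vertex with two parents, u and v.
network = {
    "r": ["u", "v"],
    "u": ["a", "h"],
    "v": ["h", "c"],
    "h": ["b"],
}

clusters = clustering_system(network, "r")
```

In this example the clusters {a, b} and {b, c} overlap without one containing the other, which cannot happen in a rooted tree. Note also that `h` and its child `b` share the cluster {b}, a small instance of a network not being determined by its clustering system.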

14.
NAR Genom Bioinform ; 5(3): lqad072, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37608800

ABSTRACT

The in silico prediction of non-coding and protein-coding genetic loci has received considerable attention in comparative genomics aiming in particular at the identification of properties of nucleotide sequences that are informative of their biological role in the cell. We present here a software framework for the alignment-based training, evaluation and application of machine learning models with user-defined parameters. Instead of focusing on the one-size-fits-all approach of pervasive in silico annotation pipelines, we offer a framework for the structured generation and evaluation of models based on arbitrary features and input data, focusing on stable and explainable results. Furthermore, we showcase the usage of our software package in a full-genome screen of Drosophila melanogaster and evaluate our results against the well-known but much less flexible program RNAz.

15.
J Integr Bioinform ; 20(3)2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37615674

ABSTRACT

The differentiation of regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy, in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input. Frequently, suitable multiple sequence alignments are not readily available and are tedious, and sometimes difficult, to construct. We therefore introduce here a new web service that provides access to the well-known coding sequence detector RNAcode with minimal user overhead. It requires as input only a single target nucleotide sequence. The service automates the collection, selection, and preparation of homologous sequences from the NCBI database, as well as the construction of the multiple sequence alignments that are needed as input for RNAcode. The service automates the entire pre- and postprocessing and thus makes the investigation of specific genomic regions for previously unannotated coding regions, such as small peptides or additional introns, a simple task that is easily accessible to non-expert users. RNAcode_Web is accessible online at rnacode.bioinf.uni-leipzig.de.


Subjects
Genomics, Software, Open Reading Frames, Sequence Alignment, Computational Biology/methods
16.
Bioinform Adv ; 3(1): vbad069, 2023.
Article in English | MEDLINE | ID: mdl-37448812

ABSTRACT

Motivation: Several genome annotation tools standardize annotation outputs for comparability. During standardization, these tools do not allow user-friendly customization of annotation databases, limiting their flexibility and applicability in downstream analysis. Results: StandEnA is a user-friendly command-line tool for Linux that facilitates the generation of custom databases by retrieving protein sequences from multiple databases. Directed by a user-defined list of standard names, StandEnA retrieves synonyms to search for corresponding sequences in a set of public databases. Custom databases are used in prokaryotic genome annotation to generate standardized presence-absence matrices and reference files containing standard database identifiers. To showcase StandEnA, we applied it to six metagenome-assembled genomes to analyze three different pathways. Availability and implementation: StandEnA is open-source software available at https://github.com/mdsufz/StandEnA. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

17.
Nat Methods ; 20(8): 1159-1169, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37443337

ABSTRACT

The detection of circular RNA molecules (circRNAs) is typically based on short-read RNA sequencing data processed using computational tools. Numerous such tools have been developed, but a systematic comparison with orthogonal validation is missing. Here, we set up a circRNA detection tool benchmarking study, in which 16 tools detected more than 315,000 unique circRNAs in three deeply sequenced human cell types. Next, 1,516 predicted circRNAs were validated using three orthogonal methods. Generally, tool-specific precision is high and similar (median of 98.8%, 96.3% and 95.5% for qPCR, RNase R and amplicon sequencing, respectively) whereas the sensitivity and number of predicted circRNAs (ranging from 1,372 to 58,032) are the most significant differentiators. Of note, precision values are lower when evaluating low-abundance circRNAs. We also show that the tools can be used complementarily to increase detection sensitivity. Finally, we offer recommendations for future circRNA detection and validation.


Subjects
Benchmarking, Circular RNA, Humans, Circular RNA/genetics, RNA/genetics, RNA/metabolism, RNA Sequence Analysis/methods
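The two headline metrics of the circRNA benchmarking abstract above, precision and sensitivity, follow the usual confusion-matrix definitions. The sketch below uses those textbook formulas with made-up counts; the study's exact operational definitions (for example, how the set of true circRNAs is approximated) are not given in the abstract and are not reproduced here.

```python
def precision(true_positives, false_positives):
    """Fraction of predicted items that were confirmed."""
    return true_positives / (true_positives + false_positives)


def sensitivity(true_positives, false_negatives):
    """Fraction of real items that the tool detected."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts for one tool: 95 of 100 predictions validated,
# and 95 of 190 known true circRNAs detected.
example_precision = precision(95, 5)
example_sensitivity = sensitivity(95, 95)
```

A tool can therefore score high on precision while missing half of the true molecules, which matches the abstract's observation that sensitivity, not precision, separates the tools.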
18.
J Bioinform Comput Biol ; 21(4): 2350016, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37522173

ABSTRACT

Most of the functional RNA elements located within large transcripts are local. Local folding therefore serves as a practically useful approximation to global structure prediction. Due to the sensitivity of RNA secondary structure prediction to the exact definition of sequence ends, accuracy can be increased by averaging local structure predictions over multiple, overlapping sequence windows. These averages can be computed efficiently by dynamic programming. Here we revisit the local folding problem, present a concise mathematical formalization that generalizes previous approaches and show that correct Boltzmann samples can be obtained by local stochastic backtracing in McCaskill's algorithms but not from local folding recursions. Corresponding new features are implemented in the ViennaRNA package to improve the support of local folding. Applications include the computation of maximum expected accuracy structures from RNAplfold data and a mutual information measure to quantify the sensitivity of individual sequence positions.


Subjects
RNA Folding, RNA, Nucleic Acid Conformation, RNA/chemistry, Algorithms, Untranslated RNA
19.
Algorithms Mol Biol ; 18(1): 8, 2023 Jul 29.
Article in English | MEDLINE | ID: mdl-37516881

ABSTRACT

BACKGROUND: RNA features a highly negatively charged phosphate backbone that attracts a cloud of counter-ions that reduce the electrostatic repulsion in a concentration-dependent manner. Ion concentrations thus have a large influence on folding and stability of RNA structures. Despite their well-documented effects, salt effects are not handled consistently by currently available secondary structure prediction algorithms. Combining Debye-Hückel potentials for line charges and Manning's counter-ion condensation theory, Einert et al. (Biophys J 100: 2745-2753, 2011) modeled the energetic contributions of monovalent cations on loops and helices. RESULTS: The model of Einert et al. is adapted to match the structure of the dynamic programming recursion of RNA secondary structure prediction algorithms. An empirical term describing the salt dependence of the duplex initiation energy is added to improve co-folding predictions for two or more RNA strands. The slightly modified model is implemented in the ViennaRNA package in such a way that only the energy parameters, but not the algorithmic structure, are affected. A comparison with data from the literature shows that predicted free energies and melting temperatures are in reasonable agreement with experiments. CONCLUSION: The new feature in the ViennaRNA package makes it possible to study effects of salt concentrations on RNA folding in a systematic manner. Strictly speaking, the model pertains only to monovalent cations, and thus covers the most important parameter, i.e., the NaCl concentration. It remains a question for future research to what extent unspecific effects of bi- and tri-valent cations can be approximated in a similar manner. AVAILABILITY: Corrections for the concentration of monovalent cations are available in the ViennaRNA package starting from version 2.6.0.
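As background for the Debye-Hückel treatment mentioned in the abstract, the sketch below computes the textbook Debye screening length for a 1:1 electrolyte such as NaCl. It shows only how screening shortens with rising salt concentration; it is not the Einert et al. model or the ViennaRNA implementation, and the function name and default values (25 °C, relative permittivity of water taken as 78.5) are assumptions for this example.

```python
import math

# Physical constants in SI units.
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
N_A = 6.02214076e23            # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19     # elementary charge, C


def debye_length_nm(ionic_strength_molar, temperature_k=298.15,
                    rel_permittivity=78.5):
    """Debye screening length in nanometres.

    For a monovalent 1:1 salt the ionic strength equals the salt
    concentration. Defaults assume water at 25 degrees Celsius.
    """
    ionic_strength = ionic_strength_molar * 1000.0  # mol/L to mol/m^3
    numerator = EPSILON_0 * rel_permittivity * K_B * temperature_k
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator) * 1e9


# Screening length at 0.1 M and 1 M monovalent salt.
length_low_salt = debye_length_nm(0.1)
length_high_salt = debye_length_nm(1.0)
```

The length scales with the inverse square root of concentration, so a tenfold increase in salt shortens the screening length by a factor of about 3.2, reducing the backbone repulsion that destabilizes folded structures.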

20.
Nat Commun ; 14(1): 3936, 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37402719

ABSTRACT

Circular RNAs (circRNAs) are a regulatory RNA class. While cancer-driving functions have been identified for single circRNAs, how they modulate gene expression in cancer is not well understood. We investigate circRNA expression in the pediatric malignancy, neuroblastoma, through deep whole-transcriptome sequencing in 104 primary neuroblastomas covering all risk groups. We demonstrate that MYCN amplification, which defines a subset of high-risk cases, causes globally suppressed circRNA biogenesis directly dependent on the DHX9 RNA helicase. We detect similar mechanisms in shaping circRNA expression in the pediatric cancer medulloblastoma implying a general MYCN effect. Comparisons to other cancers identify 25 circRNAs that are specifically upregulated in neuroblastoma, including circARID1A. Transcribed from the ARID1A tumor suppressor gene, circARID1A promotes cell growth and survival, mediated by direct interaction with the KHSRP RNA-binding protein. Our study highlights the importance of MYCN regulating circRNAs in cancer and identifies molecular mechanisms, which explain their contribution to neuroblastoma pathogenesis.


Subjects
Neuroblastoma, Circular RNA, Child, Humans, Circular RNA/genetics, N-Myc Proto-Oncogene Protein/genetics, N-Myc Proto-Oncogene Protein/metabolism, Tumor Cell Line, RNA/genetics, RNA/metabolism, Neuroblastoma/metabolism, Neoplastic Gene Expression Regulation